Detection Strategies for Extreme Mass Ratio Inspirals 
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The capture of compact stellar remnants by galactic black holes provides a unique laboratory for 
exploring the near horizon geometry of the Kerr spacetime, or possible departures from general rela- 
tivity if the central cores prove not to be black holes. The gravitational radiation produced by these 
Extreme Mass Ratio Inspirals (EMRIs) encodes a detailed map of the black hole geometry, and the 
detection and characterization of these signals is a major scientific goal for the LISA mission. The 
waveforms produced are very complex, and the signals need to be coherently tracked for hundreds to 
thousands of cycles to produce a detection, making EMRI signals one of the most challenging data 
analysis problems in all of gravitational wave astronomy. Estimates for the number of templates 
required to perform an exhaustive grid-based matched-filter search for these signals are astronomi- 
cally large, and far out of reach of current computational resources. Here a hierarchical approach to 
the EMRI detection problem is developed that employs a directed-stochastic search technique. The 
algorithm, dubbed Metropolis Hastings Monte Carlo (MHMC), is closely related to Markov Chain 
Monte Carlo and genetic algorithms. The utility of the MHMC approach is demonstrated using 
simulated data sets from the Mock LISA Data Challenge. 



I. INTRODUCTION 

The capture of stellar remnants - white dwarfs, neutron 
stars and black holes - by the massive black holes that are 
thought to reside at the centers of most galaxies provide 
an excellent laboratory for performing precision tests of 
general relativity. The large discrepancy in the masses al- 
lows the smaller body to be treated as a perturbation to 
the spacetime of the galactic black hole, and the evolution 
of the system can be treated analytically. The gravita- 
tional wave signals from these systems encode effects such 
as frame dragging, periastron advance, and spin-orbit 
coupling in highly modulated waveforms. The detection 
and characterization of these Extreme Mass Ratio Inspi- 
rals (EMRIs) is a key science goal of future space based 
gravitational wave detectors such as LISA [1|. Finding 
EMRI signals in the output of a noisy detector presents a 
challenging data analysis problem as the signals have to 
be followed for several hundred cycles in order to accumu- 
late sufficient signal-to- noise ratio (SNR) for detection. It 
has been estimated [2[ that it would take of order 10 40 
templates to perform an exhaustive matched-filter search 
for these signals. We either have to hope for a major ad- 
vance in computing, or look for sub-optimal techniques 
to the EMRI detection problem. 

It is natural to consider hierarchical strategies that ei- 
ther work with some subset of the data (for example, 
smaller time segments), or coarser parameter search grids 
that are then refined in the regions surrounding candidate 
detections. In Ref. [2| a stack-slide [3| search algorithm 
was put forward that combines both of these strategies, 
and it was estimated that with year 2013 computing re- 
sources it would be possible to detect systems with SNRs 
greater than 30, which is about a factor of two worse 
than could be done with a fully coherent search. Imple- 
menting the stack-slide algorithm is a non-trivial task, 
but it would be interesting to see how it performs on 
the simulated EMRI data sets that have been produced 
for the Mock LISA Data Challenges (MLDCs) 0. An- 



other approach to the EMRI detection problem is to use 
time- frequency techniques |5| to search for tracks in spec- 
trograms of the data. This approach has the advantage 
of being computationally cheap, and it has been applied 
with some success to the MLDC data sets Q, but there 
are concerns about how it will perform when applied to 
more realistic data containing the signals from millions 
of galactic binaries, multiple massive black hole binaries 
and hundreds of EMRIs. 

The Montana Gravitational Wave Group has been in- 
volved in a program to develop a Bayesian approach to 
LISA data analysis @, S, ©, ES El El (see Refs. 0E1 
Il5| | for a similar program applied to LIGO data analysis 
and LISA data analysis 16]). The Markov Chain Monte 
Carlo (MCMC) technique has been central to these stud- 
ies, and in particular, the Metropolis-Hastings implemen- 
tation of MCMC sampling. The goal is to compute the 
posterior distribution of the parameters that are used to 
describe the signals (e.g. the sky location) and the in- 
strument noise (e.g. the noise variance in some frequency 
window). The procedure for generating the Markov chain 
is simple: starting from some point x in parameter space, 
propose a move to a new point y; and based on the transi- 
tion probability (3 = min(l, H), either accept the move or 
stay at the current location. The new points are drawn 
from a proposal distribution q(-\x), and the transition 
probability is given by the Hastings ratio H [17J, which 
is equal to the product of posterior odds ratio and pro- 
posal odds ratio for the two points: 



H 



p(y)p(s\y)q(x\v) 

p(x)p(s\x)q(y\x) 



(1) 



Here p(x) and p(s\x) are the prior and likelihood at 
x. There are theorems that prove that the samples in 
the chain converge to the posterior (target) distribution 
for any non-pathological proposal distribution [18[. The 
rate of convergence to the target distribution ("burn- 
in time"), and the number of samples needed to accu- 
rately reconstruct the posterior ( "mixing time" ) depend 



on the particular implemention of the Metropolis Hast- 
ings algorithm being used, and on the nature of the tar- 
get distribution. For a wide class of algorithms it is 
possible to prove that the Markov chains produced by 
the Metropolis-Hastings algorithm are geometrically er- 
godic [191 ]. and for such chains there are theorems that 
provide bounds on the burn-in and mixing times - see 
Rcf. [23] for an accessible review. Unlike exhaustive grid 
searches, where the cost of the search scales geometrically 
with the dimension of the parameter space, the cost of 
a MCMC search is expected to grow as a polynomial in 
the dimension of the parameter space, thus escaping the 
curse of dimensionality. The reasoning behind this ex- 
pectation is that the MCMC algorithm ignores regions 
with low probability (most of the parameter volume), 
and focuses on peaks of the posterior distribution. Poly- 
nomial time scaling has been proven in several instances, 
including random walk Metropolis-Hastings algorithms 
applied to log-concave b 20] and truncated multivariate 
Gaussian [21( target distributions. 

The theorems that underly the MCMC approach are 
fine in theory, but in practical applications a poor choice 
of proposal distribution, starting point or algorithm im- 
plementation can lead to extremely long convergence 
times. A wide variety of techniques have been developed 
to deal with these issues (see Ref. [22J for a review) , and 
many have been applied to gravitational wave data anal- 
ysis studies. When choosing a proposal distribution it is 
a good idea to use a mixture of distributions, some that 
typically propose large jumps, and others that typically 
propose small jumps. In many instances the parame- 
ters are highly correlated in terms of their effects on the 
waveforms, and better acceptance rates can be obtained 
by jumping along eigendirections of the Fisher informa- 
tion matrix, with jumps scaled by the eigenvalues. In this 
context the Fisher information matrix provides an esti- 
mate of the local Gaussian curvature of the posterior. In 
theory one would like to start the chain near the central 
mode of the posterior, but that requires a good initial 
guess for the source parameters. Absent such informa- 
tion the chain is started at some random location and 
the chain undergoes a "burn-in" phase while it searches 
for regions with significant posterior weight. 

Samples from the burn-in phase do not follow the sta- 
tionary distribution of the posterior, and must be dis- 
carded. But this also means that the usual rules that 
apply to Markov Chains - such as detailed balance and 
reversibility - can be ignored during the burn-in phase, 
and we are free to use adaptive proposal distributions, 
simulated annealing, deterministic hill-climbing etc. In 
the course of developing these techniques to speed the 
burn-in for galactic binary and massive black hole sig- 
nals we found that the resulting algorithms became so 
effective at locating the posterior mode that there was 
no need for a good initial guess. In other words, our 
MCMC codes for exploring the posterior had become 
fully fledged search codes. The search phase of the al- 
gorithm has been dubbed "Metropolis Hastings Monte 



Carlo" (MHMC) since the majority of the moves are 
Metropolis-Hastings transitions, but the resulting chain 
is not Markovian. The effectiveness of MHMC approach 
has now been demonstrated in the first two rounds of 
Mock LISA Data challenges [24j for both galactic bina- 
ries and massive black holes. It is natural to ask if the 
same approach may be effective when applied to the far 
more challenging problem of detecting EMRI signals, and 
two groups set out to answer this question. Here is the 
story so far of the Montana effort; the description of the 
other group's work can be found in Ref. [2a |. 



II. HUNTING EMRIS 

The simulated signals used in this study were gen- 
erated using the Barack-Cutler kludge waveforms [261 ]. 
While more realistic waveforms could have been used, 
the computation of highly accurate EMRI waveforms re- 
mains an open problem, so the MLDC taskforce chose 
to use the simplest waveform model available that cap- 
tures most of the essential physics of an extreme mass 
ratio inspiral. In particular, the Barack-Cutler kludge 
waveforms depend on the same 14 parameters as the full 
waveform, and include effects such as the evolution of 
the eccentricity and orbital phases, multiple harmonics, 
precession of the periapse, and precession of the orienta- 
tion of the orbital plane [26j. A full description of the 
waveform model can be found in Ref. [27[ • 

While the Barack-Cutler kludge waveforms are far sim- 
pler to generate than other EMRI waveform models, 
straightforward implementations, such as the one used 
to generate the MLDC data sets, take several minutes to 
produce a single two year long waveform. For those of us 
without access to supercomputer resources, the cost asso- 
ciated with computing the likelihood (one component of 
the Hastings ratio) would have put the EMRI search out 
of reach. Thus the first tasks was to come up with a bet- 
ter method for implementing the Barack-Cutler kludge. 
The computationally expensive steps in the waveform 
generation are (1) the numerical integration of the orbital 
phases (2) the evaluation of the Bessel functions that ap- 
pear in the amplitude of each harmonic (3) the computa- 
tion of the antenna patterns and (4) the computation of 
the sines and cosines of the orbital phases. The first thing 
to realize is that the quantities in parts (1), (2) and (3) all 
evolve on timescales much longer than the orbital period. 
Indeed, just a couple of thousand samples is sufficient to 
cover the entire two year data set. Intermediate values 
can be found using linear interpolation. There is no get- 
ting around the fast sampling timescale demanded by the 
sines and cosines of the orbital phases, but repeated calls 
to trigonometric subroutines can be avoided by using re- 
currence relations to compute the higher harmonics from 
the lower harmonics. Combining these strategies results 
in a code that is roughly three orders of magnitude faster 
than the one used by the MLDC taskforce, and puts the 
EMRI search back into the reach of a lone workstation. 



The MHMC algorithm used in the EMRI searches 
shares many features in common with the massive black 
hole and galactic binary algorithms 0, B S QlA LLll uM ■ 
The search employs a mixture of transition proposals: 
large and small jumps in one or all of the parameters 
based on uniform draws from the parameter priors; and 
normal jumps along local eigendirections of the Fisher 
Information Matrix (FIM). The FIM was only updated 
every few hundred steps as it requires 28 calls to the wave- 
form generation routine, and these calls are better spent 
in advancing the search chain. The version of the algo- 
rithm used in MLDC Challenges 1.3 and IB. 3 did not em- 
ploy any of the specially tailored "island hopping" jumps 
that promote movement between widely separated max- 
ima of the posterior [l(J El- As with all of our MHMC 
algorithms, simulated annealing was used to ensure com- 
plete coverage of the parameter ranges during the initial 
phase of the search (simulated annealing replaces the like- 
lihood p(s\x) by p(s\x) 1 / T , where the "temperature" T is 
started high and gradually reduced to unity.) The bur- 
den on the stochastic search procedure can be lessened by 
analytically solving for the distance to the source and the 
plunge time, and by maximizing over the phase of each 
harmonic. For a small additional computational cost this 
lessens the effective search dimension from 14 to 9. Un- 
fortunately the analytic maximization procedure did not 
work as well as expected, and in the end only the dis- 
tance to the source was handled analytically. It is hoped 
that the problem can be found and fixed in time for the 
next data challenge. 

The main new ingredient in the EMRI search is a hi- 
erarchical procedure that starts with small segments of 
the full data stream and gradually builds up to using the 
full signal. The first stage of the search divides the two- 
year data stream into 8 segments. The best fit solutions 
from the first stage are then used to start the search at 
the next level (4 segments) and so on. At each transi- 
tion simulated annealing is used to promote movement of 
the chains. This hierarchical approach has several advan- 
tages: it takes less time to generate the waveforms in the 
early stages, and because the search template does not 
have to match the signal for as many cycles, it is possible 
to get a good match with a wider range of parameters. 
For example, the ~ 3 month long first stage segments 
do not constrain the sky location very well. The MHMC 
approach is ideally suited to a hierarchical search as it 
automatically ignores parameters that are poorly deter- 
mined (jumps in these parameters directions are always 
accepted), and even a marginal signal detection is enough 
to move the chain into the right general area of parame- 
ter space, from where the solution can be refined as more 
data is incorporated. On the other hand, the best fit 
parameters derived from the short data segments can be 
very far from the injected values, and this can trap the 
chain in the wrong region of parameter space. 

The biggest problem encountered while testing on the 
MLDC training data was the huge number of secondary 
modes of the posterior. Very often the initial stage of the 
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FIG. 1: Scatter plots from the initial "random restart" phase 
of a search of MLDC 1.3.1 training data showing the maxi- 
mum SNR reached as a function of the initial azimuthal or- 
bital frequency. The vertical dotted line indicates the injected 
parameter value. 



search would lock onto a secondary mode, and despite 
vigorous application of simulated annealing, the chains 
would rarely transition to the primary mode as more 
data was incorporated. Similar problems had been en- 
countered in the massive black hole and galactic binary 
searches, and there the solution had been to introduce 
"island hopping" moves to promote movement between 
modes. The theory behind this strategy is that there 
is usually at least an approximate relationship between 
the location of the posterior modes. For example, for 
low frequency sources the sky position is bi-modal, and 
the two modes are at roughly antipodal points on the sky. 
Similarly, galactic binaries produce an island chain of sec- 
ondary maxima spaced in frequency by roughly 1/year. 
In other words, once one posterior mode has been lo- 
cated, you usually have at least a rough idea of where 
the other modes will be. The landscape of the EMRI 
posterior is only partially understood at this time, and 
the compressed calendar of the MLDC challenges did not 
allow time for island hopping moves to be developed. 
Instead a random restart approach was used, whereby 
dozens of short (~ 20, 000 point) search chains were run 
on each of the 3 month data segments (see Figure [!} , 
and three or four of the highest likelihood solutions from 
the first stage were then used as starting points for the 
(~ 100,000 point) second stage search (see Figure [5]). 
A combination of simulated annealing and thermostat- 
ing [!2] was used during the first two stages of the search. 
The initial temperature was set at T = 20, and pro- 
gressively reduced using a standard power law cooling 
schedule. The cooling was stopped if the temperature 
fell below T sta t = SNR 2 /100, and was then maintained 
at T s t a t for the remainder of the stage. Here SNR stands 
for the signal-to-noise ratio at the current location of the 




FIG. 2: Scatter plots from the second phase of a search 
of MLDC 1.3.1 training data showing the maximum SNR 
reached as a function of the initial azimuthal orbital frequency 
/, cosine of the angle A between the orbital angular momen- 
tum and the black hole spin, the spin magnitude x> an d the 
ecliptic longitude <j>. The vertical dotted line indicates the 
injected parameter values. 
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FIG. 3: Evolution of the signal-to-noise ratio during the four 
stages of a MHMC search applied to MLDC 1.3.1 training 
data. 



chain. This thermostating procedure ensures that the ef- 
fective SNR never exceeds 10, and helps to prevent the 
chains from getting stuck on local peaks of the posterior. 
During the final stages of the search thermostating was 
used for the first half of the ~ 100, 000 point run, then 
a power law cooling schedule was used to bring the tem- 
perature down to T = 1 during the second half of the 
run. The improvement in the signal-to-noise during the 
various stages of a search applied to MLDC 1.3.1 training 
data are shown in Figure [3) 

In tests on the MLDC training data the search strat- 
egy outlined above was able to successfully recover signals 
from the 1.3.1, 1.3.3, IB. 3.1, and IB. 3. 2 data sets (the 



1.3.4 and 1.3.5 data sets were not studied since they in- 
volve lower mass central black holes, which pushes up the 
cost of generating the waveforms). In the cases where the 
search failed, the chains ended up on secondary maxima 
of the posterior. For MLDC challenge IB. 3 the MHMC 
algorithm was run on the blind data sets for IB. 3.1, 
IB. 3. 2 and IB. 3. 3. The search performed very well in 
challenge IB. 3.1, and returned the first (to our knowl- 
edge) perfect blind recovery of a simulated EMRI signal. 
Figure f?] shows the recovered marginalized posterior dis- 
tributions for the blind challenge IB. 3.1 data set. Note 
that the recovered parameter distributions overlap with 
the injected parameters for all 14 source parameters. 



Unfortunately there was little time left to run on the 
other data sets, and the initial random restart phase of 
the searches had to be cut short. The parameters re- 
turned for IB. 3. 2 and IB. 3. 3 corresponded to secondary 
modes of the posterior. Comparing the recovered param- 
eters to the injected source parameters the cause of the 
problem became clear: the best fit templates had wave- 
forms that were offset by the side-band frequency. At 
any instant of time, the harmonics of the Barack-Cutler 
kludge waveforms have frequencies f mn = nv+2f^+mf ai 
where n, m are integers, v is the azimuthal orbital fre- 
quency, /^ is the periapse precession frequency and f a is 
the orbital plane precession frequency. It is difficult to 
get a good match if v has been misidentified as this pa- 
rameter plays a key role in the rate of orbital evolution. 
But the match remains high if fa — > fa ± / Q /2, which is 
what occurred in the IB. 3. 2 and IB. 3. 3 searches. 



Since submitting the results of the MLDC IB. 3 blind 
challenge searches, the MHMC algorithm has been sig- 
nificantly improved by incorporating several "island hop- 
ping" moves, including some that deal with the sideband 
misidentification problem described above. The way that 
large jumps - such as full range draws from the priors and 
island hops - are implemented has also been significantly 
improved. In the past when a large jump was proposed 
the Hastings ratio was computed and the move was ac- 
cepted or rejected according to the usual Metropolis re- 
jection procedure. Now the large jumps are replaced by a 
reconnaissance search that works as follows: first a large 
jump is proposed, but rather than immediately accepting 
or rejecting the move, a short search of the new region 
is performed, and the Metropolis rejection is applied to 
the end point of this search. The reconnaissance proce- 
dure significantly improves the odds of the new location 
being accepted, and is especially useful for island hop- 
ping moves where the location of the adjacent islands is 
only approximately known. Since the goal of the short 
reconnaissance search is to climb the nearest hill, gra- 
dient based methods such as the Nelder-Mead simplex 
algorithm [2J|, or Langevin MCMC ^ are used. With 
these improvements the algorithm robustly recovers all 
the MLDC 1.3 and 1B.3 signals. 
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III. DISCUSSION 
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FIG. 4: The recovered marginalized posterior distributions for 
the blind challenge IB. 3.1 data set. The solid (blue) vertical 
lines indicate the injected source parameters and the dotted 
(black) vertical lines indicate the recovered source parameters. 
Note that the distribution functions for the recovered signal 
parameters overlap the injected signal parameters for all 14 
parameters, which shows that the signal was recovered to the 
accuracy allowed by the instrument noise. 



We now have an algorithm that is able to detect iso- 
lated, bright (SNR > 50) EMRI signals in stationary, 
Gaussian instrument noise. The next step is to develop 
the algorithm further so that it can detect weaker signals 
in data sets containing multiple EMRI signals (such as 
in the MLDC Challenge 3.3 data sets J4|), and eventu- 
ally, in data sets with a full galactic foreground, multiple 
massive black holes and realistic instrument noise. There 
is reason to be optimistic that all of these challenges can 
be met. Signals have been correctly identified in subsets 
of the full data stream with SNRs as low as 15, and since 
the search uses matched filtering, it should be robust in 
the presence of signals from other sources. 
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